Skip to content

Conversation

@slevis-lmwg
Copy link
Contributor

@slevis-lmwg slevis-lmwg commented Mar 28, 2024

Description of changes

I started from issue #1059 and PR #2019 and am restarting here. From closed PR #2019 I preserved a few things:

  • If avgflag /= 'I', time_bounds is present and time = mid of time_bounds
  • Some long_name changes
  • Treating avgflag as a tape (not field) trait for 'I' and 'L' tapes

In this PR we also intend to split all history tapes into instantaneous and non-instantaneous. The CAM group accomplished this in PR ESCOMP/CAM#903 and we plan to maintain consistency in the appearance of CLM and CAM history files. Possibly helpful in CAM's PRs:

The work will need to be repeated in

UPDATE
Due to higher priority, I completed the task of changing "time" to equal the middle of time_bounds in
#2838
ESCOMP/MOSART#106
ESCOMP/RTM#39

Specific notes

Contributors other than yourself, if any:
@ekluzek @billsacks

CTSM Issues Fixed (include github issue #):
Fixes #1059
Resolves ESCOMP/RTM#32
Resolves ESCOMP/MOSART#52

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)?
We intend to split all history tapes into instantaneous and non-instantaneous.

@slevis-lmwg slevis-lmwg self-assigned this Mar 28, 2024
@slevis-lmwg slevis-lmwg added enhancement new capability or improved behavior of existing capability priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations labels Mar 29, 2024
@slevis-lmwg
Copy link
Contributor Author

This far I have tried one test:
ERP_D_P48x1.f10_f10_mg37.IHistClm51Bgc.izumi_nag.clm-decStart
The test completes as expected and shows different answers from dev175 only for the "time" variable.

@wwieder wwieder added this to the CESM3 Answer changing freeze milestone Jun 20, 2024
@wwieder
Copy link
Contributor

wwieder commented Oct 16, 2024

Conceptually, this seems like a good idea. Practically, this seems like a heavy lift.
I wonder how high a priority it should be for the CESM project and the CLM team?
How critical is this kind of consistency for CESM3?
@dlawrenncar and @briandobbins, could you weigh in here?
@billsacks and @phillips-ad, I think you've thought more about this too?

@phillips-ad
Copy link

I think that consistency in the component output of CESM3 is quite important from a user's standpoint. CTSM would be an outlier amongst the 4 major components if these changes were not implemented. Looking at the project page, these changes have already been put into CAM and CICE. MOM came with these settings out of the box. @slevis-lmwg has a couple of pull requests with these changes for MOSART and RTM. I am a bit unclear where CISM is on implementing this. The only component that I know of that will likely not be implementing this is WW.

@billsacks
Copy link
Member

I started to write a reply but @phillips-ad beat me to it. The critical piece of this is fixing time values for time-averaged fields so that the time is in the middle of the averaging period instead of at the end. As @phillips-ad says, it is likely to be particularly confusing if CTSM (and the river components) differs from other components in this respect. The separation of instantaneous from time-averaged fields is not a critical piece of this, but seemed like the best way to support having correct time values for both time-averaged fields and instantaneous fields.

@slevis-lmwg
Copy link
Contributor Author

I started to write a reply but @phillips-ad beat me to it. The critical piece of this is fixing time values for time-averaged fields so that the time is in the middle of the averaging period instead of at the end. As @phillips-ad says, it is likely to be particularly confusing if CTSM (and the river components) differs from other components in this respect. The separation of instantaneous from time-averaged fields is not a critical piece of this, but seemed like the best way to support having correct time values for both time-averaged fields and instantaneous fields.

@billsacks I find the distinction in priority between the "middle of averaging period" task and the "separating instantaneous and non" task very helpful, because the former will take me a few days at most, while the latter will take me a few weeks at least.

@billsacks
Copy link
Member

I find the distinction in priority between the "middle of averaging period" task and the "separating instantaneous and non" task very helpful, because the former will take me a few days at most, while the latter will take me a few weeks at least.

Thanks, that's good to know, @slevis-lmwg . I'm curious what the plan is, then, for instantaneous fields? Will you just let the time be incorrect for those fields for now?

@phillips-ad do you agree that getting time correct for the time-averaged fields is the high priority thing here even if instantaneous fields aren't handled ideally for now?

@phillips-ad
Copy link

Yes I 100% agree, getting the time set to the middle of the averaging period for time-averaged fields is the higher priority.

@samsrabin
Copy link
Member

Just wanted to add: Separating instantaneous and time-averaged fields will probably be an issue for #2061, which adds a namelist option to enable all our history fields and put them all on the h0 file. Depending on which PR comes in first, we'll have to think about how we want to handle that.

@slevis-lmwg
Copy link
Contributor Author

Just wanted to add: Separating instantaneous and time-averaged fields will probably be an issue for #2061, which adds a namelist option to enable all our history fields and put them all on the h0 file. Depending on which PR comes in first, we'll have to think about how we want to handle that.

Thank you for pointing this out @samsrabin

@slevis-lmwg
Copy link
Contributor Author

Based on the recent conversation here, I have opened a new PR #2838 that has the same starting point as this one (#2445). The new PR will not make separate instantaneous and non-instantaneous history tapes.

@slevis-lmwg slevis-lmwg changed the title Make separate instantaneous and non-inst. history tapes; for the latter set time = mid of time_bounds Make separate instantaneous and non-inst. history tapes Oct 18, 2024
@slevis-lmwg slevis-lmwg removed the priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations label Oct 18, 2024
@ekluzek
Copy link
Collaborator

ekluzek commented Oct 23, 2024

Thanks, that's good to know, @slevis-lmwg . I'm curious what the plan is, then, for instantaneous fields? Will you just let the time be incorrect for those fields for now?

@billsacks

Yes, we will let the instantaneous fields be incorrect to start with. And then a later more involved change will fix that.

I like the way that @samsrabin framed this when we discussed this. Right now "I" fields are correct, but "A" fields are incorrect. But, we care more about "A" fields (there's more of them output and they are the standard for most things) so getting them corrected at the expense of "I" fields is an improvement.

Reduce outputs from matrixcnOn tests

slevis resolved conflicts:
src/main/histFileMod.F90
@slevis-lmwg

This comment was marked as resolved.

@samsrabin
Copy link
Member

As part of work on #2838, I've added:

  • Code in python/ctsm/crop_calendars/ to handle either instantaneous or non-inst. tapes
  • RXCROPMATURITYINST and RXCROPMATURITYSKIPGENINST SystemTests, to test things when a certain tape is forced to be instantaneous.

As part of this PR, the python code can be simplified to remove the non-instantaneous tape handling (where there's an option for both). Those new SystemTests can also be removed.

Bring in several fixes for testing in the previous cesm3_0_beta03/04 tags

Fix a list of issues mostly for testing that came in with the cesm3_0_beta03 and cesm3_0_beta04 tags
for the science "chill" deadline bringing in the baseline science capabilities needed for the cesm3_0 release.

List of things:

- New polarcap grid
- Fix use_init_interp for several resolutions that had problems
- Remove clm5_1 physics option
- Newly needed surface datasets added to auto build in Makefile for mksurfdata_esmf
- Fix several individual tests that were failing
- Update to submodules to ones roughly based on cesm3_0_beta04
- Add graceful error checking to the FATES parameter file tests that will get it working in some cases
@slevis-lmwg
Copy link
Contributor Author

Try Sam R.'s comparison tool.

~samrabin/pr_2445_baseline_compare/pr_2445_baseline_compare.py -1 /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.061 tests_0701-173109de

Only one result to investigate and seems like a false DIFF:

ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file clm2.h0.1851-01.nc inst: field LIT_LIGN_Cap_vr differs

@slevis-lmwg
Copy link
Contributor Author

slevis-lmwg commented Jul 9, 2025

@samsrabin this commit earlier today fixed some RXCROP tests but it fails in these two:

FAIL RXCROPMATURITY_Lm61.f09_g17.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput RUN
FAIL RXCROPMATURITY_Lm61.f09_t232.IHistClm60BgcCropCrujra.derecho_intel.clm-cropMonthOutput RUN

These two generate this error
No files found matching patterns: ['*h2a.1852-01*.nc', '*h2a.1852-01*.nc.base']
while the tests that now pass were previously failing with the error
No files found matching patterns: ['*h2i.1852-01*.nc', '*h2i.1852-01*.nc.base']
Could we troubleshoot this together when you're available? I will put something tentative in the calendar.

UPDATE I'm reverting above-mentioned commit and submitting the tests again, in case I'm missing something:

PASS ./create_test RXCROPMATURITY_Lm61.f09_g17.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm_sci-ctsm5.3.062b -g /glade/campaign/cgd/tss/ctsm_baselines/ctsm_sci-ctsm5.3.062
PASS ./create_test RXCROPMATURITY_Lm61.f09_t232.IHistClm60BgcCropCrujra.derecho_intel.clm-cropMonthOutput -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm_sci-ctsm5.3.062b -g /glade/campaign/cgd/tss/ctsm_baselines/ctsm_sci-ctsm5.3.062
PASS ./create_test RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.062b -g /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.062
PASS ./create_test SMS_D_Ld733.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput--clm-RxCropCalsAdaptGGCMI

@slevis-lmwg
Copy link
Contributor Author

Update:
@samsrabin the above reported failures were due to setting the history extensions to h2a. I may be confused about the same tests failing previously when setting the history extensions to h2i. At this point all RXCROP tests passed (see above).

@slevis-lmwg
Copy link
Contributor Author

@phillips-ad just a heads up that I will merge this PR this week (as early as today). Let me know if you have a documentation file that you would like me to update accordingly.

@phillips-ad
Copy link

@phillips-ad just a heads up that I will merge this PR this week (as early as today). Let me know if you have a documentation file that you would like me to update accordingly.

Thanks again for doing all of this @slevis-lmwg! The time change documentation is currently here; you should have editing permissions. This text will eventually be posted on the CESM3 webpage to document the output changes for every component.

@slevis-lmwg slevis-lmwg merged commit 400bfa0 into ESCOMP:master Jul 9, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from In progress - master to Done (non release/external) in CTSM: Upcoming tags Jul 9, 2025
@slevis-lmwg slevis-lmwg deleted the sep_hXi_hXa_tapes_iss1059 branch July 9, 2025 17:48
@samsrabin
Copy link
Member

@slevis-lmwg

Try Sam R.'s comparison tool.

~samrabin/pr_2445_baseline_compare/pr_2445_baseline_compare.py -1 /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.061 tests_0701-173109de

Only one result to investigate and seems like a false DIFF:

ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file clm2.h0.1851-01.nc inst: field LIT_LIGN_Cap_vr differs

I'm seeing more issues than that with that file:

❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst: time long_name incorrect: expected 'time at end of time step' but got 'time at exact middle of time_bounds'
❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst: Unexpectedly has
 time_bounds
❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst field mcdate long_name expected ending 'at end of time step'; got 'current date (YYYYMMDD) at end of time_bounds'
❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst field mcsec long_name expected ending 'at end of time step'; got 'current seconds of current date at end of time_bounds'
❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst field mdcur long_name expected ending 'at end of time step'; got 'current day (from base day) at end of time_bounds'
❌ ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings file mosart.h0.1851-01.nc inst field mscur long_name expected ending 'at end of time step'; got 'current seconds of current day at end of time_bounds'

@samsrabin
Copy link
Member

samsrabin commented Jul 9, 2025

Currently doing pr_2445_baseline_compare.py for the ctsm5.3.061 vs ctsm5.3.062 baselines; will contact you when I have more info. but it's looking like there are problems with the MOSART and RTM implementations of the a/i split.

@slevis-lmwg
Copy link
Contributor Author

Currently doing pr_2445_baseline_compare.py for the ctsm5.3.061 vs ctsm5.3.062 baselines; will contact you when I have more info. but it's looking like there are problems with the MOSART and RTM implementations of the a/i split.

I see what you mean. I saw these and took them to be expected, but I now see that mosart (and presumably rtm) hXi files end up with the same metadata as hXa files. I will open issues in mosart (and rtm after I've checked rtm, too).

@samsrabin
Copy link
Member

samsrabin commented Jul 9, 2025

Log is at /glade/u/home/samrabin/pr_2445_baseline_compare/61_vs_62.log. In addition to the aforementioned MOSART and RTM issues, I'm seeing what appear to be real diffs in the clm2.h0(i).1851-01.nc file for the following tests:

  • ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings
  • REP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings
❌ ... field CWDC_Cap_vr differs
❌ ... field LIT_CELC_Cap_vr differs
❌ ... field LIT_LIGC_Cap_vr differs
❌ ... field SOM_ACTC_Cap_vr differs
❌ ... field SOM_PASC_Cap_vr differs

(Note that the output in that log file is for the -1 version of the command, so it only shows that first line.)

Also, these expected failures:
⚠️ ERP_P256x2_Ld30.f45_f45_mg37.I2000Clm60FatesRs.derecho_intel.clm-mimicsFatesCold: No files found in baseline
⚠️ LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default: No files found in baseline
⚠️ LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default--clm-matrixcnOn_ignore_warnings: No files found in baseline
⚠️ SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.derecho_nvhpc.clm-crop: No files found in baseline

And these expected no-outputs:
⚠️ FUNITCTSM_P1x1.f10_f10_mg37.I2000Clm50Sp.derecho_intel: No files found in baseline
⚠️ PFS_Ld10_PS.f19_g17.I2000Clm50BgcCrop.derecho_intel: No files found in baseline

@samsrabin
Copy link
Member

I think it's actually expected that those two tests should fail here. From the expected failures file:

  <test name="ERP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings">
    <phase name="BASELINE">
      <status>FAIL</status>
      <issue>#2619</issue>
      <comment>This failure relates to the following REP failure.</comment>
    </phase>
  </test>
  <test name="REP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings">
    <phase name="COMPARE_base_rep2">
      <status>FAIL</status>
      <issue>#2619</issue>
      <comment>This failure relates to the preceding ERP failure.</comment>
    </phase>
  </test>
  <test name="REP_P64x2_Ld396.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings">
    <phase name="BASELINE">
      <status>FAIL</status>
      <issue>#2619</issue>
      <comment>This failure relates to the preceding ERP failure.</comment>
    </phase>
  </test>

@slevis-lmwg
Copy link
Contributor Author

slevis-lmwg commented Jul 9, 2025

This PR took a while due to many competing priorities, but also due to the need for significant and somewhat subtle code changes. Now that it's done, I'd like to say thank you to various people who helped along the way and I hope I'm not forgetting anyone:
@billsacks @ekluzek @samsrabin @phillips-ad

@billsacks
Copy link
Member

Thank YOU very much, @slevis-lmwg ! I haven't followed all of the challenges and work needed to get this working, but I really appreciate all of your effort on this!!

🚀🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement new capability or improved behavior of existing capability PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations test: rivers Test RTM/MOSART/mizuRoute before merging

Projects

Status: Done (non release/external)
Status: Done

6 participants